Compiling Python

Existing projects

mypyc

Compiles a subset of (type-annotated) Python to C.
doc: https://mypyc.readthedocs.io/en/latest/
source: https://github.com/python/mypy/tree/master/mypyc
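As a rough illustration of the kind of input mypyc targets, here is a small, fully annotated function. It is ordinary Python and runs unchanged under CPython. The module name `fib.py` is hypothetical, and the compile command in the comment should be checked against the mypyc documentation linked above.

```python
# fib.py (hypothetical module name)
# Plain, fully annotated Python: the kind of code mypyc is designed to compile.
# With mypyc installed, running `mypyc fib.py` should build a C extension module.

def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The same file can be imported with or without compilation, which makes it easy to compare both variants.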

mycpp

https://www.oilshell.org/blog/2022/05/mycpp.html “This post is for Python experts! I briefly describe mycpp, a hacky Python-to-C++ translator I wrote on top of the MyPy type checker.”

Question: MyPy Subset with Python 3.10 Pattern Matching?

Python 3.10 was released late last year, with structural pattern matching.

One thing I found difficult when building on top of MyPy is the “inverted” visitor style. My brain doesn’t like “losing the stack”, i.e. when local variables become member variables.

I also noticed that recent versions of the AST module support type comments, which Oil still uses.

So I wonder if anybody has tried to write a strictly typed subset of MyPy with Python 3.10 pattern matching and the new ast info? I think that could be a fun project. If it’s limited to what Oil uses, it shouldn’t be too big.

Gradual typing was crucial to get Oil to where it is now, but our translator can now assume strict typing.

transonic

vision: https://fluiddyn.netlify.app/transonic-vision.html
source: https://foss.heptapod.net/fluiddyn/transonic

PPCI

https://pypi.org/project/ppci/

The PPCI (Pure Python Compiler Infrastructure) project is a compiler written entirely in the Python programming language. It contains front-ends for various programming languages as well as machine code generation functionality. With this library you can generate (working!) machine code using Python (and thus very easy to explore, extend, etc.)!

Byterun and Tailbiter

https://github.com/nedbat/byterun.git
https://github.com/darius/tailbiter.git

Byterun = Python VM written in Python (see also: https://aosabook.org/en/500L/a-python-interpreter-written-in-python.html)
Tailbiter = compiler from Python to bytecode, written in Python.

https://github.com/rocky/x-python = fork (maintained) of byterun
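The idea behind a bytecode interpreter can be sketched with a toy stack machine. This is only an illustration: the opcode names `PUSH`, `ADD` and `MUL` are made up for this example and are neither CPython's instruction set nor Byterun's actual code.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]


def run(program: List[Instruction]) -> int:
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack: List[int] = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    # The result of the whole program is whatever is left on top of the stack.
    return stack.pop()


# 2 + 3 * 4, written in the postfix order a compiler would emit.
program = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)]
```

A real Python VM additionally has to manage frames, names, blocks and exceptions, which is what the projects above deal with.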

Core Python

https://github.com/windelbouwman/corepython = “A Python to WebAssembly compiler written in Rust.”

Features:
- Very minimal subset of the Python language. Only the core of it, nothing fancy.
- CorePython compiler itself is embeddable in browser (small WebAssembly download).

Older projects

https://mython.org/

Mython is an extensible variant of the Python programming language. Mython makes Python extensible by adding two things: parametric quotation statement, and compile-time metaprogramming. The parametric quote statement is simply syntactic sugar for saying “run some function on this embedded string”. Compile-time metaprogramming allows you to evaluate that function on the embedded string at compile time. This gives you added choice, both in terms of what your code looks like, and when you want to evaluate that code.

➜ Not sure what this really means, or whether it’s useful (it was never released).
➜ There are a few research papers.
- https://dl.acm.org/doi/10.1145/1837513.1640141 “Language embedding and optimization in mython”
- http://people.cs.uchicago.edu/~jriehl/dissertation.pdf

Interesting techniques / libraries

Term rewriting

Python projects:

Article:

“Improving performance of Python code using rewriting rules technique”
http://ceur-ws.org/Vol-2866/ceur_115-125jereb11.pdf / http://pp.isofts.kiev.ua/ojs1/article/view/403

“We propose an approach to increasing performance of Python code by transforming fragments of code to more efficient languages such as Cython and C++. We use high-level algebraic models and rewriting rules technique for semi-automated code transformation. Performance-critical fragments of code are transformed into a low-level syntax model using Python parser”
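A single rewriting rule can be sketched with the standard library's `ast.NodeTransformer`. This is not the method from the article, which uses algebraic models and a dedicated rewriting system; it only shows what a rule of the form `NAME ** 2 -> NAME * NAME` looks like on a Python AST. The class name `SquareToMul` and the helper `rewrite` are hypothetical.

```python
import ast


class SquareToMul(ast.NodeTransformer):
    """Rewrite rule: NAME ** 2  ->  NAME * NAME."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite sub-expressions first
        if (
            isinstance(node.op, ast.Pow)
            and isinstance(node.left, ast.Name)  # only plain names: no side effects to duplicate
            and isinstance(node.right, ast.Constant)
            and type(node.right.value) is int
            and node.right.value == 2
        ):
            duplicate = ast.Name(id=node.left.id, ctx=ast.Load())
            product = ast.BinOp(left=node.left, op=ast.Mult(), right=duplicate)
            return ast.copy_location(product, node)
        return node


def rewrite(source: str) -> str:
    """Apply the rule to a source string and return the rewritten source."""
    tree = SquareToMul().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+
```

The rule deliberately matches only plain names on the left-hand side, so an expression such as `f(x) ** 2` is left alone rather than evaluated twice.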

Others:
- http://strategoxt.org/ (see also: https://tudelft-cs4200-2020.github.io/lectures/2020/10-transformation/CS4200-2020-10-transformation.pdf)

AST manipulation

Read this first: https://pybit.es/articles/ast-intro/

Libraries:

  • ‘astpretty’: Pretty print the output of python stdlib ast.parse.
  • ‘astunparse’: An AST unparser for Python. This is a factored out version of unparse found in the Python source distribution; under Demo/parser in Python 2 and under Tools/parser in Python 3.
  • astor (previously codegen): Python AST read/write
  • Asteria (Al2O3): Missing AST features (monkey-patches ast using astor / astpretty)
  • gast: A generic AST to represent Python2 and Python3’s Abstract Syntax Tree(AST). GAST provides a compatibility layer between the AST of various Python versions, as produced by ast.parse from the standard ast module.
  • beniget: A static analyzer for Python2 and Python3 code. Beniget provides a static over-approximation of the global and local definitions inside Python Module/Class/Function. It can also compute def-use chains from each definition.
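Several of these libraries build on, or backport, what the standard `ast` module already offers. As a baseline, here is a minimal sketch using only the standard library: parse a module, inspect it, and generate source again. The sample functions `area` and `perimeter` are made up for the example.

```python
import ast

source = """
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)
"""

tree = ast.parse(source)

# Inspect: collect the names of all function definitions.
function_names = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]

# Print a readable dump of the first function (indent= needs Python 3.9+).
print(ast.dump(tree.body[0], indent=2))

# Generate source again from the tree (ast.unparse needs Python 3.9+).
regenerated = ast.unparse(tree)
```

On older Python versions, `astunparse` or `astor` cover the unparsing step and `astpretty` the readable dump.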

Parsers (for other things)

  • Textx

Others

  • ‘multipledispatch’,
  • ‘jinja2’,
  • ‘numexpr’: The numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine.

Books and tutorials

Course/Book: Essentials of Compilation - An Incremental Approach in Python
https://github.com/IUCompilerCourse/Essentials-of-Compilation + https://github.com/IUCompilerCourse/python-student-support-code
➜ Interesting tutorial / book for undergraduate students. Compiles Python to machine language. Easy to read.

“Let’s Write an LLVM Specializer for Python!”
http://dev.stephendiehl.com/numpile/ and https://github.com/sdiehl/numpile
➜ Very interesting tutorial, but old.
➜ Hints at some additional ideas (including using https://github.com/sdiehl/pyrewrite and https://github.com/sdiehl/subpy)

https://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler
➜ Shows some basic compilation techniques (Python AST ➜ Python Bytecode)
➜ Repo here: https://github.com/darius/tailbiter
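For comparison, the AST ➜ bytecode step that Tailbiter implements in Python is also available through CPython's built-in `compile()`, which accepts an AST as well as source text. The sketch below uses only the standard library; the variable names are arbitrary, and the disassembly printed by `dis` differs between Python versions.

```python
import ast
import dis

source = "result = (1 + x) * 2"

tree = ast.parse(source)                    # source text -> AST
code = compile(tree, "<example>", "exec")   # AST -> code object holding bytecode

dis.dis(code)                               # print the bytecode instructions

# Run the compiled code object in a namespace that provides `x`.
namespace = {"x": 4}
exec(code, namespace)
```

After execution, `namespace["result"]` holds the computed value.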

https://arxiv.org/abs/2011.13127 “Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode”
➜ Maybe interesting (or not) for us. At least it contains references to recent papers.

https://bernsteinbear.com/blog/typed-python/
➜ Explores whether Python’s type annotations, introduced with PEP 484, enable compiling Python to native code for speed.
➜ Types offer hints, but they fall short of what is needed to optimize Python, because the language is so dynamic and flexible. Key obstacles include operator dispatch, heap-allocated objects, and runtime behaviors like subclassing and mutable globals.
➜ Projects like Numba, Mypyc, and Static Python optimize specific scenarios, but often require trade-offs such as reduced dynamism or a modified “dialect” of Python.
➜ Type-driven optimizations still show promise, especially for numeric and constrained use cases.
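The subclassing obstacle can be shown in a few lines. This is a small illustration in the spirit of that post, with made-up names (`Weird`, `add`): the annotation `x: int` does not tell a compiler that `+` is integer addition, because any subclass of `int` also satisfies it.

```python
class Weird(int):
    """An int subclass that satisfies the `int` annotation but changes `+`."""

    def __add__(self, other):
        return "not a number"


def add(x: int, y: int) -> int:
    # A compiler cannot assume this is machine-integer addition:
    # the actual behavior depends on the runtime type of x.
    return x + y


plain = add(2, 3)               # ordinary ints: 5
surprising = add(Weird(2), 3)   # dispatches to Weird.__add__
```

Compilers that do emit fast integer code therefore need a guard, a restricted dialect, or a whole-program guarantee that no such subclass reaches the function.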

WASM

https://github.com/appcypher/awesome-wasm-langs -> Lua, mruby, grain…
https://tomassetti.me/wasi-how-to-run-webassembly-code-outside-of-your-browser/

Additional references

Benchmarks

Interesting compiler in Python

More

See also

Compilers

Testing

Page last modified: 2024-12-09 23:29:05